Type inference on Wikipedia list pages
Authors
Abstract
The extraction of information from Wikipedia has made a huge amount of knowledge widely available through projects such as DBpedia. So far, most effort has gone into extracting explicitly encoded information, e.g. from infoboxes. However, Wikipedia also contains a large amount of implicit knowledge. One example of a so-far untapped source of implicit knowledge is Wikipedia's "List of" pages, each of which collects multiple entities that share a common type. If this common type is known, it can be added to every entity on the list. Moreover, entities that appear on such a list but are not yet present in DBpedia can be added. This offers great potential for extending DBpedia with missing type information. This paper proposes an approach to extracting the shared type of a list using statistical methods and natural language processing. For the list entities, it was possible to infer new types with a precision of 86%.
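The core idea — propagate a type shared by the list's members to the remaining entities — can be sketched as a simple majority vote over the DBpedia types already known for some members. This is a hypothetical heuristic for illustration only; the paper's actual approach combines statistical methods with natural language processing, and the function and data below are invented:

```python
from collections import Counter

def infer_list_type(member_types):
    """Infer the shared type of a 'List of' page by majority vote over
    the known type labels of its members (a hypothetical heuristic).

    member_types: one set of type labels per list entity whose types
                  are already known.
    Returns the most frequent type, or None if no types are known.
    """
    counts = Counter(t for types in member_types for t in types)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Invented example: a list page where three members already
# carry DBpedia types.
known = [
    {"Person", "Scientist"},
    {"Person", "Scientist", "Physicist"},
    {"Person"},
]
shared = infer_list_type(known)  # "Person" occurs most often
```

The inferred type could then be assigned to list members that have no type yet, which is exactly the extension of DBpedia the abstract describes.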
Similar resources
Clustering of Wikipedia Pages on Edit Behaviors
We consider the edit history of Wikipedia to perform clustering of its pages. We conjecture that the editors exhibit homophily, or high correlation in terms of their topics of interest. Therefore, it is possible to use the edit history to cluster pages having the same or closely related topics. We validate our clustering results with the list of categories and the incoming and outgoing links on...
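One way to operationalize the homophily conjecture above is to compare the sets of editors who have touched two pages: pages sharing many editors are candidates for the same cluster. The Jaccard measure below is a hypothetical proxy for this signal, not the similarity actually used in that paper:

```python
def editor_similarity(editors_a, editors_b):
    """Jaccard overlap of the editor sets of two pages — a hypothetical
    proxy for editor homophily between related pages."""
    a, b = set(editors_a), set(editors_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Invented edit histories: two pages sharing two of four editors.
sim = editor_similarity({"u1", "u2", "u3"}, {"u2", "u3", "u4"})  # 2/4
```

A pairwise similarity matrix built this way could feed any standard clustering algorithm.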
Relational Inference for Wikification
Wikification, commonly referred to as Disambiguation to Wikipedia (D2W), is the task of identifying concepts and entities in text and disambiguating them into the most specific corresponding Wikipedia pages. Previous approaches to D2W focused on the use of local and global statistics over the given text, Wikipedia articles, and their link structure to evaluate context compatibility among a list ...
Joint Bootstrapping of Corpus Annotations and Entity Types
Web search can be enhanced in powerful ways if token spans in Web text are annotated with disambiguated entities from large catalogs like Freebase. Entity annotators need to be trained on sample mention snippets. Wikipedia entities and annotated pages offer high-quality labeled data for training and evaluation. Unfortunately, Wikipedia features only one-ninth as many entities as Freebase,...
Real-time monitoring of sentiment in business related Wikipedia articles
We present an online service that monitors Wikipedia pages for companies in real time and detects sentiment with respect to the edits, the companies, and the editors. It monitors the IRC stream, detects company-related articles using a small hand-built list, and performs sentiment analysis using a sentiment-annotated word list. The system generates a report that can be emailed to users.
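The word-list scoring step described above can be sketched as averaging per-word valences over an edit's text. The tiny lexicon and function names below are invented for illustration; the actual system uses a full sentiment-annotated word list:

```python
# Hypothetical miniature sentiment lexicon: word -> valence in [-1, 1].
SENTIMENT = {"profit": 1.0, "growth": 0.8, "scandal": -1.0, "loss": -0.7}

def score_edit(text):
    """Average valence of lexicon words in the edit text; 0.0 if none match."""
    words = text.lower().split()
    vals = [SENTIMENT[w] for w in words if w in SENTIMENT]
    return sum(vals) / len(vals) if vals else 0.0

score_edit("accounting scandal and loss")  # average of -1.0 and -0.7
```

Running such a scorer over each incoming edit from the IRC stream, filtered by the hand-built company list, yields the per-edit sentiment the report aggregates.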
A Graph-Based Approach to Skill Extraction from Text
This paper presents a system that performs skill extraction from text documents. It outputs a list of professional skills that are relevant to a given input text. We argue that the system can be practical for hiring and management of personnel in an organization. We make use of the texts and the hyperlink graph of Wikipedia, as well as a list of professional skills obtained from the LinkedIn so...
Journal:
Volume, Issue:
Pages: -
Publication year: 2016